Analysis of Variance
Department of Educational Psychology
Agenda
1 Overview and Introduction
2 Solutions for Assumption Violations
3 Correct for Violations in Data
4 Use Non-parametric Tests
5 Conclusion
In module 2, we discussed the intricacies of avoiding Type I and II errors in hypothesis testing, the connection to statistical power, and how assumption violations can create problems in Type II errors
In this module, students should be able to:
2 Solutions for Assumption Violations
Without going into too much detail, we are concerned with how robust a test is, or how resilient a test is to assumption violations, and how well it works under less-than-ideal circumstances
Some researchers hold that many commonly used tests, e.g., t-tests, are reasonably robust to assumption violations at baseline
3 Correct for Violations in Data
There are several options for selectively transforming and/or trimming our data to correct for certain patterns of skewness, kurtosis, or outliers
Mathematical variable transformations are used largely to address the normality assumption, but may indirectly solve other issues as well
The exact transformation is dependent on the type of problem, specifically the skew:
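As a minimal sketch, assuming a positively skewed variable with strictly positive values, a log transformation (one commonly used choice, alongside the square root) pulls in the long right tail. For negative skew, a common approach is to reflect the variable first and then apply the same kind of transformation. The scores and the `skewness` helper below are hypothetical and only illustrate the idea.

```python
import math

def skewness(values):
    """Population skewness: third central moment divided by variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical, positively skewed scores (illustrative data only).
raw_scores = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]

# Natural-log transform; this requires strictly positive values.
log_scores = [math.log(v) for v in raw_scores]

skew_raw = skewness(raw_scores)
skew_log = skewness(log_scores)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

The transformed variable is on a log scale, so any results need to be interpreted on that scale rather than in the original units.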
Advantages:
Disadvantages:
Another option, particularly useful for heavy-tailed (leptokurtic) distributions, which have an unusual number of observations in the tails, is to trim the sample.
A trimmed sample is a sample where a fixed percentage of extreme values is removed from each tail.
A related option is winsorizing
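The two ideas can be sketched side by side. This is a minimal illustration, assuming 10% is cut from each tail and using the usual definition of winsorizing (extreme values are replaced by the nearest retained value rather than removed). The scores and the function names `trim` and `winsorize` are hypothetical.

```python
def trim(values, proportion):
    """Drop the given proportion of observations from each tail."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # observations cut per tail
    return ordered[k:len(ordered) - k]

def winsorize(values, proportion):
    """Replace each tail's extreme values with the nearest retained value."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))
    if k == 0:
        return ordered
    low, high = ordered[k], ordered[len(ordered) - k - 1]
    return [min(max(v, low), high) for v in ordered]

# Hypothetical scores with one extreme value in the upper tail.
scores = [2, 4, 5, 5, 6, 6, 7, 7, 8, 30]

trimmed = trim(scores, 0.10)
winsorized = winsorize(scores, 0.10)

print(trimmed)     # [4, 5, 5, 6, 6, 7, 7, 8]
print(winsorized)  # [4, 4, 5, 5, 6, 6, 7, 7, 8, 8]
```

Trimming shrinks the sample from 10 to 8 observations, while winsorizing keeps all 10 but changes the two most extreme values.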
Advantages:
Disadvantages:
4 Use Non-parametric Tests
Non-parametric tests are distribution-free statistical tests that are not based on parameters (e.g., means or standard deviations) or on assumptions about the normality of the underlying data distribution.
They do still have some assumptions, just not about normality! More on that in Assumptions of Non-parametric Tests
Instead, non-parametric tests are based on quantities such as percentages (chi-square) or ranks, i.e., ordinal or ordinal-transformed data
Non-parametric tests are often nearly as powerful as traditional parametric tests, and when assumptions are violated they can be much more powerful
We’ll discuss the Wilcoxon Rank-Sum Test and the Mann-Whitney U Test, which are non-parametric tests comparing the rank orderings of two independent samples
We’ll also cover the Wilcoxon Matched-Pairs Signed-Ranks Test, roughly an analog to the dependent-samples t-test
Because these tests use ranks, they can actually use ordinal data, unlike the t-test!
The Wilcoxon Rank-Sum Test is based on the logic that if there truly is a difference between two groups, the ranks from one group should generally be lower than the ranks from the other group.
Following from that, if the groups are different, the sum of the ranks from one group should be lower than the sum of the ranks from the other group.
| Group A Scores (\(n_1 = 4\)) | Group B Scores (\(n_2 = 5\)) |
|---|---|
| 85 | 70 |
| 92 | 82 |
| 88 | 75 |
| 95 | 78 |
| | 80 |
| Score | Group | Rank |
|---|---|---|
| 70 | B | 1 |
| 75 | B | 2 |
| 78 | B | 3 |
| 80 | B | 4 |
| 82 | B | 5 |
| 85 | A | 6 |
| 88 | A | 7 |
| 92 | A | 8 |
| 95 | A | 9 |
| Group A Ranks | Group B Ranks |
|---|---|
| 6 | 1 |
| 7 | 2 |
| 8 | 3 |
| 9 | 4 |
| | 5 |
| Sum (\(W_A\)) = 30 | Sum (\(W_B\)) = 15 |
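The ranking steps in the tables above can be sketched in code. This is a minimal illustration using the Group A and Group B scores from the example; the helper `average_ranks` is a hypothetical name, and it assigns tied scores their average rank (there are no ties in this particular example).

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

group_a = [85, 92, 88, 95]
group_b = [70, 82, 75, 78, 80]

# Pool both groups, rank the pooled scores, then sum the ranks by group.
pooled = group_a + group_b
ranks = average_ranks(pooled)

w_a = sum(ranks[:len(group_a)])
w_b = sum(ranks[len(group_a):])
print(w_a, w_b)  # 30.0 15.0
```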
For the Mann-Whitney U Test, we start with the exact same steps as the Wilcoxon Rank-Sum Test, calculating \(W_s\)
Then we calculate the \(U\) statistic:
\[ U = \frac{n_1(n_1 + 2n_2 + 1)}{2} - W_s \]
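As a worked check using the Group A and Group B example from the rank-sum tables, and assuming \(W_s\) denotes the rank sum of the smaller group (here Group A, so \(n_1 = 4\), \(n_2 = 5\), and \(W_s = W_A = 30\)), the formula gives:

\[ U = \frac{4(4 + 2 \cdot 5 + 1)}{2} - 30 = \frac{60}{2} - 30 = 0 \]

A value of \(U = 0\) is consistent with the data: every Group A score is higher than every Group B score, so the two groups do not overlap at all.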
| Participant | Before | After | Difference (\(d_i\)) |
|---|---|---|---|
| 1 | 120 | 115 | -5 |
| 2 | 135 | 130 | -5 |
| 3 | 110 | 120 | +10 |
| 4 | 145 | 145 | 0 |
| 5 | 130 | 110 | -20 |
| 6 | 125 | 122 | -3 |
| \(|d_i|\) | Absolute Rank | Sign |
|---|---|---|
| 3 | 1 | Negative |
| 5 | 2.5 | Negative |
| 5 | 2.5 | Negative |
| 10 | 4 | Positive |
| 20 | 5 | Negative |
| Absolute Rank | Positive Ranks (\(R^+\)) | Negative Ranks (\(R^-\)) |
|---|---|---|
| 1 | | 1 |
| 2.5 | | 2.5 |
| 2.5 | | 2.5 |
| 4 | 4 | |
| 5 | | 5 |
| Sum (\(W\)) | \(W^+ = 4\) | \(W^- = 11\) |
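The signed-rank steps in the tables above can be sketched in code. This is a minimal illustration using the Before and After scores from the example; it drops the zero difference, ranks the absolute differences with average ranks for ties, and sums the ranks by sign. The helper `average_ranks` is a hypothetical name.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

before = [120, 135, 110, 145, 130, 125]
after = [115, 130, 120, 145, 110, 122]

# Differences (after - before); zero differences are dropped before ranking.
diffs = [a - b for b, a in zip(before, after) if a != b]

# Rank the absolute differences, then sum the ranks separately by sign.
ranks = average_ranks([abs(d) for d in diffs])
w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
print(w_plus, w_minus)  # 4.0 11.0
```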
\[ r = \frac{z}{\sqrt{n}} \]
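The effect size above needs a \(z\) value, which the slides do not define at this point. As a minimal sketch, assume \(z\) comes from the usual large-sample normal approximation to the rank-sum statistic (without a continuity correction) and that \(n\) is the total number of observations. The numbers come from the Group A and Group B example; with only nine observations the approximation is rough, so this is purely illustrative.

```python
import math

n1, n2 = 4, 5        # group sizes from the rank-sum example
w_s = 30             # rank sum of the smaller group (Group A)
n_total = n1 + n2    # assumption: n in the formula is the total sample size

# Mean and standard deviation of the rank sum under the null hypothesis.
expected_w = n1 * (n_total + 1) / 2
sd_w = math.sqrt(n1 * n2 * (n_total + 1) / 12)

# Normal-approximation z (no continuity correction), then r = z / sqrt(n).
z = (w_s - expected_w) / sd_w
r = z / math.sqrt(n_total)
print(f"z = {z:.3f}, r = {r:.3f}")
```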
5 Conclusion
We have now discussed various methods we may pursue when dealing with oddities in our data that result in assumption violations
Some of these strategies involve carefully modifying our data by transforming, trimming, or winsorizing; some involve selecting alternative tests; and finally, we may do nothing (mindfully)
Each of our options introduces new things to be careful about, such as transformations changing the scale of our data, trimming removing data, or non-parametric tests changing the interpretation of results
Our discussion of the non-parametric analogs for the t-test will help lead us into a similar discussion on alternative tests to the one-way ANOVA (and others) later in the semester
Above all, we must be transparent about how we work through assumption problems
Module 3 Lecture - Transformations and Non-parametric Comparisons for Two Groups || Analysis of Variance